The state-of-the-art in decision software is at a level of data storage,
display,  and  computation  as  an  aid  to  a  sophisticated  user.  Almost  certainly, 
the  emerging  generation  of  decision  software  will  be  designed  to  perform  a 
larger  range  of  analyst  functions.  We  have  focused  on  two  potential  problems 
challenging  the  computerization  of  decision  analysis,  and  on  assessing  the 
extent to which these problems can be overcome. First, to what extent can the
often ill-defined art of structuring be transformed into software; and secondly,
to what extent is past consumers' satisfaction with decision analysis a function
of the formal methods and procedures of the theory and rationale of decision
theory, and to what degree do other factors such as personal interaction
and the establishment of a rapport account for client approval?

We compared multiattribute utility analyses of personal decision problems
of undergraduates performed by a human analyst with those performed by a
"stand-alone" software package, Multi Attribute Utility Decomposition (MAUD 3).
Although subjects overwhelmingly gave more favorable reports for the analyst
session than for the MAUD 3 session, agreement with and acceptance
of the analyst and MAUD 3 results (implied ordering and most preferred
alternative) did not differ. We did find that subjects feel better taken care of
when more attributes are included in the analysis, but that subjects' holistic
ratings are better accounted for by analyses with a smaller rather than a
larger number of attributes.

Although  the  analyst  attribute  sets  were  more  often  judged  more 
complete  and  better  in  overall  quality,  the  MAUD  3  attribute  sets  were  more 
often judged more nearly independent, both logically and valuewise. Furthermore,
the attribute set of each pair with the greater number of dimensions
was  overwhelmingly  chosen  as  being  more  complete,  less  independent,  and  of 
higher  quality  than  the  other  attribute  set.  Interestingly,  judgments  of 
overall  quality  were  virtually  identical  to  those  of  completeness. 

We  found  that  MAUD  3  is  not  truly  "stand-alone".  In  particular, 
our subjects needed at least some instruction in the attribute elicitation
phase of the program. We also found that most subjects are unable
to  answer  the  brlts  weighting  question  properly;  uninstructed  responses 
exhibit  a  sort  of  risk  aversion  that  renders  the  weights  virtually 
meaningless. 

INTRODUCTION

Computerized decision aids have become an indispensable tool
in decision analysis. The majority of the early aids were designed
to perform such functions as data storage, information display,
and sensitivity analysis. For example, Decisions and Designs,
Inc. developed several aids, largely for performing
rapid assessment and sensitivity analysis in simple decision structures
involving multiattribute alternatives (EVAL), uncertainty
(DECISION), or both (OPINT). Such aids typically require the services
of a decision analyst, or a team of analysts, including one
who directs the execution of the program. The primary emphasis of
these  decision  aids  is  on  augmenting  the  efficiency  and  power  of 
the  analyst,  and  not  on  the  direct  automation  of  critical  analyst 
functions,  such  as  option  invention,  problem  structuring,  or  parameter 
elicitation. 

We  expect  the  development  of  decision  aids  to  go  in  two  directions. 
First, the aids will become less dependent on the presence of a decision
analyst. Second, aids will be tailored to more substantive
problem  areas  using  perhaps  generic  problem  structures  and  expertise 
or  data  bases  in  combination  with  standard  decision  analysis  methodology. 

The  computerization  of  decision  analysis  in  these  directions 
faces  numerous  problems.  The  purpose  of  the  present  paper  is  to 
focus  on  two  such  problems.  First,  most  of  what  goes  on  in  decision 
analysis,  especially  during  early  phases  of  option  generation  and 
problem  structuring,  is  more  accurately  described  as  "art"  than  as 
"science."  To  what  extent  can  this  often  ill-defined  art  be  defined 
by  precise  formal  algorithms  that  can  be  translated  into  software? 
Secondly, past consumers of decision analysis have expressed both
satisfaction with the process and acceptance of the conclusions of
analyses.  To  what  extent  is  this  satisfaction  and  acceptance  a 
function  of  the  formal  methods  and  procedures  embodied  in  decision 
theory  and  the  technology  of  decision  analysis,  and  to  what  extent 
do  other  factors  such  as  personal  interaction  and  the  establishment 
of  a  rapport  account  for  client  approval? 

We  will  first  discuss  empirical  research  and  speculation  bearing 
on the issues of problem structuring, option invention, user satisfaction
with decision analysis procedures, and user acceptance of
decision analysis recommendations. Following this brief review,
we will describe an exploratory study comparing an analyst-run
decision  analysis  with  a  similar  one  conducted  by  computer. 

Option  Generation  and  Problem  Structuring 

Most  algorithms  for  option  generation  and  problem  structuring 
require  certain  parts  of  the  problem  to  be  given  (e.g.,  goals), 
and  then  derive  other  parts  (e.g.,  alternatives)  from  the  given 
structure. Pearl (Note 1) first proposed a procedure for option
generation, in which alternatives are sought which achieve individual
goals, while being unconstrained with respect to the remaining
goals. Presumably this procedure frees the option generator/decision
maker from the highly constrained situation under which options
are usually sought, leading both to a larger number and to a better
quality of options. Pearl, Leal, and Saleh (Note 2) recently developed
a structuring program that utilizes this method, called the
GOal-Directed DEcision Structuring System (GODDESS).

In a recent behavioral study, Pitz, Sachs, and Heerboth
(1980) tested Pearl's "means-ends analysis" approach to option
generation. College students were given a personal decision
problem scenario, and asked to generate as many reasonable alternatives
as possible. Three groups were given a list of objectives
(attributes) and asked to generate alternatives that satisfied
the objectives one at a time, two at a time, or all simultaneously.
Two groups were simply given example alternatives (either categorized
or randomly displayed); one group was told to think of possible
objectives relevant to the decision problem, and one group was told
only to generate alternatives. Although the mean group differences
were small, there was a tendency for the single attribute maximizers
to produce more reasonable alternatives and for the multiple attribute
maximizers to produce fewer alternatives.

Gardiner  (1977)  proposed  the  notion  of  "decision  spaces"  as  an 
aid  for  the  general  problem  of  deciding  when  to  stop  looking  for 
more  alternatives.  Beta  probability  density  functions  (pdfs)  are 
constructed  on  each  attribute  to  model  the  distribution  of  available 
or potential options over that attribute. Using the MAU model, the
single attribute pdfs can then be aggregated into a single pdf over
the overall value. Such a pdf puts given options into the perspective
of past or future options. Alternatives should be eliminated
if  their  value  falls  in  the  lower  tail  of  the  distribution.  If  few 
or  no  alternatives  fall  in  the  upper  tail,  more  options  need  to  be 
generated.  In  addition,  improvement  of  options  is  likely  to  occur 
in  those  attributes  on  which  most  options  score  in  the  middle  or 
in the lower tail of the pdf. Thus, Gardiner (1977) suggests formal
rules for deciding how to eliminate and to improve options. There
has been no behavioral experiment in the context of decision spaces.
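
As a rough illustration of the decision-space idea, the following Python sketch simulates a distribution of overall values and screens options against its tails. It is not Gardiner's procedure: the Monte Carlo aggregation, the independence of attributes, the additive form of the MAU model, the Beta parameters, the weights, and the 10% and 90% tail cutoffs are all assumptions chosen for the example.

```python
import bisect
import random


def simulate_overall_values(beta_params, weights, n_draws=20000, seed=0):
    """Return sorted overall values of simulated options.

    beta_params: one (alpha, beta) pair per attribute (hypothetical values).
    weights: additive MAU weights, assumed to sum to 1.
    Attributes are sampled independently, an assumption of this sketch.
    """
    rng = random.Random(seed)
    draws = [
        sum(w * rng.betavariate(a, b) for (a, b), w in zip(beta_params, weights))
        for _ in range(n_draws)
    ]
    draws.sort()
    return draws


def percentile_rank(sorted_draws, value):
    """Fraction of simulated options with overall value below `value`."""
    return bisect.bisect_left(sorted_draws, value) / len(sorted_draws)


def screen(value, sorted_draws, lower=0.10, upper=0.90):
    """Classify an option by where it falls in the simulated distribution."""
    rank = percentile_rank(sorted_draws, value)
    if rank < lower:
        return "eliminate"
    if rank > upper:
        return "upper tail"
    return "keep"


# Two attributes with hypothetical Beta pdfs and equal weights.
draws = simulate_overall_values([(2, 2), (2, 5)], [0.5, 0.5])
options = {"A": 0.15, "B": 0.40, "C": 0.70}  # hypothetical overall values
labels = {name: screen(v, draws) for name, v in options.items()}
if "upper tail" not in labels.values():
    print("No option in the upper tail: generate more options.")
print(labels)
```

The same percentile ranking could be applied attribute by attribute to locate the dimensions on which most options score in the middle or lower tail, which is where the text suggests improvement is most likely.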

Leal and Pearl (1977) discuss an (unnamed!) interactive program
for eliciting problem structures in the form of action-event
decision trees. The software employs an algorithm for expanding
sensitive  portions  of  the  structure  and  pruning  insensitive  portions, 
based  on  provisional  intermediate  values.  In  this  case  the  structuring 
process  is  derived  from  a  specification  of  the  options  and  early 
skeletal  elements  of  the  structure. 

Humphreys and Wooler (Note 3) describe a set of "structuring
heuristics" for generating objectives and goals in a multiattribute
evaluation problem. One procedure, implemented in a computer program
called "Multi-Attribute Utility Decomposition" (MAUD; Humphreys and
Wisudha, Note 4), derives attributes from a predetermined specification
of options. The user is asked to identify the opposite poles
of dimensions along which the available options differ. Humphreys and
McFadden (1980) tested their attribute assessment procedures in
several applications, concluding that "where MAUD was able to aid people
it did so through the reduction of goal confusion, and through consciousness
raising about the structure of value-wise importances of
attributes possessed by choice alternatives".

Pitz, Sachs, and Brown (Note 5) report an empirical test of a
technique for generating problem structures and options simultaneously.
After an initial cursory listing of options and goals, the decision
maker is told to carefully consider each goal separately, and to
try to identify other options that might be helpful in fulfilling
individual goals. (This part is similar to the option generation procedure
tested by Pitz, Sachs, and Heerboth, 1980.) Next, the decision
maker is told to consider each option, including the newly generated
ones, and is required to project possible outcomes of each. The decision
maker is asked to identify additional goals that would be
satisfied by projected desirable outcomes and undermined by undesirable
outcomes. Pitz et al. (Note 5) report that college students
faced with a personal planning problem produced more choices and
more goals with this procedure than with three alternative methods.

It should be apparent from our review that computerization of
the structuring part of decision analysis is still in its infancy.
The basic idea of fleshing out a structure by building on initial
skeleton elements and using decision analytic relations seems
promising. So far, this idea has been used mainly to formalize
the derivation of options from goals and the derivation of attributes
from options. It is conceivable that this technique could be
extended to hypothesis generation and act-event structuring as well.

User Satisfaction and Acceptance of Recommendations

There  is  relatively  little  data  from  controlled  experiments  on 
our  second  problem  relevant  to  the  computerization  of  decision 
analysis:  user  satisfaction  with  the  process  of  decision  analysis 
and acceptance of recommended courses of action. Fischhoff (1981)
speculates  on  sources  of  resistance  to  personal  decision  analysis: 

People who need decision analysis may reject it (1) because
they are personally threatened by having to face and acknowledge
their own doubts and desires, (2) because they wish
to avoid decision analysis' full public disclosure requirement,
(3) because they feel uncomfortable and incompetent to deal
with probabilities and multi-attribute certainty equivalents,
(4) because they are afraid to innovate.

Presumably, the anonymity of computerized decision aids could prove
to be a major benefit with regard to the first two concerns voiced
by Fischhoff. The absence of another person of whom to ask
questions and from whom to seek reassurance could prove to be a disadvantage
with respect to his last two concerns. Presently, we have no good
data, only speculation.

On the issue of user acceptance of the "best alternative"
suggested by a decision analysis, Fischhoff (1981) states:

Once a decision analysis has been performed, its bottom-line
recommendations may be rejected because they are
viewed as the output of numerical mumbo-jumbo which has
no intuitive appeal and cannot be readily justified to
superiors, subordinates, constituents, etc.

Fischhoff is clearly focusing on the issue of trust here.
Whether a computer or human analyst is viewed as more trustworthy
will certainly vary with the situation; a good deal of the variance
is probably accounted for by the reputation of the decision aid
(analyst or computer) and the attitudes of the decision maker
and those affected by the decision. In a personal decision context,
much would depend upon the decision maker's personal knowledge
of the aid (either analyst or computer) gained through past experience.

Christen and Samet (Note 6) report some provocative data on
the issue of decision maker acceptance of recommendations arrived
at by OPINT, Decisions and Designs, Inc.'s computer aid for the
rapid screening of decision options. In their laboratory evaluation,
experienced naval intelligence analysts were presented with a
background scenario and intelligence summaries, and required to
diagnose enemy military plans either with or without the assistance
of OPINT. A set of "correct" diagnoses was determined independently for
each stimulus intelligence report. OPINT's recommendations to the aided
officers were correct about 33% more often than the decisions of the
unaided officers. But the aided officers frequently disagreed
with OPINT, leading to essentially equal performance between
aided and unaided officers. Apparently the lack of confidence
in the aid produced a substantial decrement in the officers'
performance. Since OPINT requires the services of a decision
analyst, many questions remain unanswered. In particular, would
a similar decision aid administered by an analyst without a
computer have produced more or less rebellion from the naval
officers? Would a user-oriented stand-alone version of OPINT have
instilled more or less confidence?

The  Present  Experiment 

We sought to directly compare multi-attribute utility analyses (MAUA)
performed by an analyst and by a computer program. We selected
MAUD 3, an interactive MAUA program "designed to work in direct
interaction with the decision maker, without a decision analyst,
counselor, or other 'expert' as intermediary" (Humphreys and
McFadden, 1980). Each college student subject interacted with both
an analyst and MAUD 3 at different times. The experiment afforded
four critical comparisons of the analyst and computer sessions,
related to differences in: (1) final recommendations, (2) correspondence
of final recommendations with intuition (holistic
assessments), (3) the number and quality of attributes, and (4) stated
satisfaction with the process and confidence in the results. The
repeated  observations  (within  subjects)  design  chosen  offers  the 
most  sensitive  tests  of  differences  between  the  MAUD  3  and  analyst 
interactions,  especially  with  regard  to  problem  structuring. 


All decision problems were multi-attribute evaluation problems
generated individually by our subjects. Although Pitz and his
colleagues  have  performed  several  value  experiments  with  college 
students  demonstrating  the  viability  of  hypothetical  scenarios 
(e.g.,  roommate  difficulties,  Pitz,  Sachs,  and  Heerboth,  1980; 
apartment  choice,  Pitz,  Heerboth,  and  Sachs,  1980;  and  vacation 
plans,  Pitz,  et  al.,  Note  5),  we  elected  to  elicit  personal 
problems  that  were  currently  important  for  each  individual  subject. 
(Each  of  the  three  examples  above  was  proposed  by  at  least 
one  of  our  subjects.)  We  felt  that  questions  of  user  satisfaction 
and  confidence  could  only  be  addressed  in  a  real  context  tapping 
personally  relevant  values*. 

Since  MAUD  3  does  not  provide  any  mechanisms  for  option 
generation,  all  options  were  generated  prior  to  either  decision 
analytic  session.  So  as  to  increase  our  ability  to  detect  differences 
in  the  recommendations  of  the  two  analyses  and  their  correspondence 
with intuition, only feasible, highly viable (non-dominated) alternatives
were allowed. Finally, in order to maintain maximum sensitivity
to differences in model recommendations, we sought to reduce
random  judgmental  errors  by  requiring  that  the  subject  possess  a 
minimal  level  of  knowledge  of  the  proposed  choice  alternatives. 

METHOD 

Design  Overview 

Thirty-five college students underwent two versions of multiattribute
utility analysis in two experimental sessions, each lasting
from 1 to 3 hours. The complete protocol of the experiment is presented
chronologically in Table 1.

Insert Table 1 about here

All subjects identified an evaluation problem at the beginning of
the first session that (1) was personally important and relevant,
(2) involved four or more viable alternatives, and (3) required
information that was readily accessible. Twenty-four subjects
interacted with the computer program (MAUD 3) during the first
session, and with one of five human analysts during the second
session; the remaining eleven subjects first interacted with a
human analyst, and then with MAUD 3. Before and after each MAUA
interaction, subjects provided various judgments, including (1) "holistic
ratings" of the choice alternatives, (2) rankings of different
vectors of alternative ratings, and (3) self-report measures of the
usefulness of the MAUA technique used.

Following all experimental sessions, each subject's pair of
attribute sets (MAUD 3 and analyst) was presented (blindly) to
three of the five analysts, along with a generic description of
the corresponding choice alternatives (e.g., "college majors").
Analysts made quantitative judgments concerning the completeness,
logical independence, and value independence of the attribute sets,
as well as their "overall" or "global" quality.

Subjects 

Sixty-seven college students (31 females, 36 males) enrolled
in an introductory psychology course at the University of Southern
California were interviewed. Of these, 35 (52%) were able to
identify a multiattribute evaluation problem that met the requirements
of personal relevance and accessible information, and

TABLE 1

Experiment  Protocol 


Session  1 


Induction  Interview:  Problem  specification,  listing  of  alternatives, 
and  subject  screening 

Pre-MAUA holistic ratings of alternatives* (H1)

Interaction with MAUD 3 or analyst: Multiattribute values (A1 and E1)
are derived from assessed weights and equal weights, respectively

Post-MAUA holistic ratings of alternatives* (H2)

Self-report ratings of the interaction**

Ranking of Session 1 alternative rating sets (H1, H2, and A1; also
E1 for MAUD 3 session)***


Session  2  (approximately  one  week  later) 


Pre-MAUA  holistic  ratings  of  alternatives*  (H3) 

Interaction  with  MAUD  3  or  analyst:  Multiattribute  values  (A2  and  E2) 
are  derived  from  assessed  weights  and  equal  weights,  respectively 

Post-MAUA  holistic  ratings  of  alternatives*  (H4) 

Self-report  ratings  of  the  interaction** 

Ranking  of  Session  2  alternative  rating  sets  (H3,  H4,  A2;  also  E2  for 
MAUD  3  session)*** 

Forced  choice  between  most  preferred  set  of  alternative  ratings  from 
Session  1  and  from  Session  2 

Forced choice between assessed weight model composites from Session 1
(A1) and from Session 2 (A2)

Ordinal  judgment  of  superiority  between  MAUD  3  and  the  analyst  on  the 
self-report  items 


Debriefing:  Discussion  of  MAUD  3  and  analyst  procedures  and  discre¬ 
pancies  among  holistic  ratings  and  MAUA  recommendations 

*Each subject listed his/her choice alternatives from most preferred
(assigned an anchor of 100) to least preferred (anchored at 0).
Subjects were told that an alternative (X) should be rated 50 if
the increment in desirability from the worst alternative to X was
equivalent to the increment in desirability from X to the best alternative.

**Each  subject  rated  (from  1  to  10)  the  degree  to  which  he/she: 

(1)  had  discovered  new  aspects  of  the  problem  via  the  MAUA  interaction; 

(2)  felt  comfortable  during  the  interaction; 

(3) thought the MAUA had helped to solve the problem;

(4)  trusted  the  MAUA  to  recommend  the  "best"  alternative;  and 

(5)  would  desire  to  use  the  particular  MAUA  technique  for  future 
decision  problems. 

***Model composite evaluations derived from assessed weights (and
equal weights for MAUD 3 sessions) were normalized to the same 0-100
scale as the holistic ratings by subtracting the lowest rating from
all ratings, dividing by the difference between the lowest and the
highest rated alternative, and multiplying by 100. Each subject rank
ordered the three sets of alternative ratings (four sets for MAUD 3
sessions) in terms of his/her agreement with the ratings (and implied
orderings). The source of rating sets was not explicitly identified.
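
The normalization described in the last footnote to Table 1 can be sketched in a few lines of Python. The additive composite, the alternative and attribute names, and all numbers below are hypothetical and serve only to illustrate the rescaling of assessed-weight (A) and equal-weight (E) composites to the 0-100 scale of the holistic ratings.

```python
def additive_composite(values, weights):
    """Weighted sum of single-attribute values for each alternative."""
    return {
        alt: sum(weights[attr] * v for attr, v in attr_values.items())
        for alt, attr_values in values.items()
    }


def rescale_0_100(scores):
    """Subtract the lowest score, divide by the range, multiply by 100."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        raise ValueError("all alternatives have the same composite value")
    return {alt: 100.0 * (s - lo) / (hi - lo) for alt, s in scores.items()}


# Hypothetical single-attribute values on a 0-1 scale.
values = {
    "major_1": {"interest": 1.0, "job_prospects": 0.2},
    "major_2": {"interest": 0.5, "job_prospects": 0.6},
    "major_3": {"interest": 0.0, "job_prospects": 1.0},
}
assessed = {"interest": 0.7, "job_prospects": 0.3}        # hypothetical weights
equal = {attr: 1.0 / len(assessed) for attr in assessed}  # equal-weight model

a_ratings = rescale_0_100(additive_composite(values, assessed))
e_ratings = rescale_0_100(additive_composite(values, equal))
print(a_ratings)
print(e_ratings)
```

After rescaling, the best alternative under each model sits at 100 and the worst at 0, so the model-derived ratings can be laid side by side with the anchored holistic ratings.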

included at least four viable alternatives. These 35 students
(22 females, 13 males) served as subjects; the rest were dismissed.
All 67 students received credit toward a course requirement proportional
to the number of hours of participation.

Problem  Specification 

One male experimenter conducted all 67 induction interviews
in a private office, each lasting from approximately 13 minutes to
1½ hours. Each interview began with the subject reading a brief
description of the experiment, outlining the purpose of the two
experimental sessions. Subjects were told that the first step
would be to identify a decision problem that was personally important
and currently relevant. Hypothetical choice situations, decisions
that had already been made and acted upon, and problems whose outcomes
had no clear, direct effect on the subject were discouraged
by the experimenter, and ultimately rejected. The experimenter
stressed that the problem should involve options with distinctly
positive and negative aspects. In particular, a proposed alternative
to a decision problem was rejected if the subject admitted not really
knowing very much about the alternative, or if the subject felt that
the alternative, although a possible course of action, was not something
he/she could envision ever really doing.

The  final  product  of  the  induction  interview,  for  the  35  subjects 
who  developed  choice  dilemmas  meeting  the  experimental  criteria,  was 
a  list  of  at  least  four  and  not  more  than  eight  well  defined  alternative 
courses of action. Problems included choosing among majors (at USC) (11),
colleges to which to transfer (9), places to live (in the Los Angeles
area) (6), careers (4), travel plans (2), automobiles (1), sports
activities (1), and strategies for handling a roommate difficulty (1).

Most of the 32 rejected subjects identified a choice dilemma decomposable
as an MAU evaluation problem, but one that failed to meet one
or more experimental requirements. In particular, many male students
were reluctant to consider as many as four viable alternatives, insisting
that they had "narrowed" problems down to only two (usually) or three
alternatives. As a result, the rejection rate for males (64%) was
significantly higher than that for females (29%), who often indicated
little or no pre-screening of alternatives.

MAUD  3  Sessions 

Computer operation instructions. All MAUD 3 sessions were monitored by the
same experimenter who conducted the induction interviews and collected
all judgments outside of the MAUA interactions. After providing pre-MAUD 3
holistic ratings of the alternatives, subjects were led to a
separate room near that of the experimenter. Subjects were seated at
a desk, on top of which sat an IBM 5110 minicomputer, an IBM 5103
printer, and a cathode ray tube (CRT) monitor. MAUD 3 was pre-loaded
into the computer storage before subjects arrived, and all sessions
began with the MAUD 3 request for a session name. Subjects were given
a standard introduction to familiarize them with the keyboard, CRT, and printer.

Subjects  were  told  that  MAUD  3  would  eventually  ask  a  question 
beginning:  "Do  you  want  to  investigate  your  preferences?"  The  subject 
was  instructed  to  stop  when  that  question  appeared,  and  to  report  to  the 
experimenter's  office.  Subjects  were  told  also  that  the  experimenter 
would  be  available  in  his  office  prior  to  the  "stop"  question,  should 
there  be  a  problem,  but  that  he/she  should  attempt  to  communicate 
with  MAUD  3  without  additional  help. 

Elicitation of attributes and single-dimension values. Details
of the MAUD 3 assessment can be found in Humphreys and
McFadden (1980) and Humphreys and Wisudha (Note 4). Briefly,
MAUD 3 begins by recursively eliciting attributes, assessing single-dimension
value functions, and checking for correlations between pairs
of value dimensions. Endpoint descriptions of attributes are determined
by asking how triads of alternatives differ, or by asking for the endpoints
directly. Single-attribute value functions are assessed by
placing each alternative on a nine-point rating scale (anchored at the
elicited endpoints), determining an "ideal point" along the 9-point
range, and normalizing under an assumption of piece-wise linearity.
When significant correlations between these normalized value functions
are detected, the subject is given an opportunity to combine the two
dimensions under a single heading; otherwise they remain in the analysis
as separate attributes. After the addition of each attribute (beyond
the first three), MAUD 3 allows the subject to review the attribute
descriptions, single-attribute value ratings and ideal points, and
normalized single-attribute values. After MAUD 3 finished the attribute
elicitation, the experimenter asked whether subjects were sure that all
attributes were included. About half of the subjects added attributes
at that point. Subsequently, subjects performed the brlts assessment
of scaling parameters (weights). No subject wished to assume that the
various attributes were all equally important in determining preference.
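
The single-attribute scoring and correlation check described above might be sketched as follows. This is an illustration under stated assumptions, not MAUD 3's actual algorithm: we assume that value falls off linearly with distance from the ideal point, that the alternative nearest the ideal point scores 1 and the farthest scores 0, and that a Pearson correlation above a hypothetical cutoff of 0.8 counts as "significant". All names and ratings are invented.

```python
from itertools import combinations


def single_attribute_values(ratings, ideal):
    """Normalize 1-9 ratings to 0-1 values that peak at the ideal point."""
    for r in list(ratings.values()) + [ideal]:
        if not 1 <= r <= 9:
            raise ValueError("ratings and ideal point must lie on the 1-9 scale")
    distances = {alt: abs(r - ideal) for alt, r in ratings.items()}
    nearest, farthest = min(distances.values()), max(distances.values())
    if farthest == nearest:
        # The attribute does not discriminate among the alternatives.
        return {alt: 1.0 for alt in ratings}
    return {
        alt: (farthest - d) / (farthest - nearest) for alt, d in distances.items()
    }


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def correlated_pairs(values_by_attr, cutoff=0.8):
    """Attribute pairs whose value vectors correlate above the assumed cutoff."""
    flagged = []
    for a, b in combinations(sorted(values_by_attr), 2):
        alts = sorted(values_by_attr[a])
        r = pearson([values_by_attr[a][k] for k in alts],
                    [values_by_attr[b][k] for k in alts])
        if abs(r) > cutoff:
            flagged.append((a, b, r))
    return flagged


# Hypothetical ratings of four apartments on two elicited dimensions.
rent = single_attribute_values({"apt_1": 2, "apt_2": 5, "apt_3": 9, "apt_4": 7}, ideal=6)
size = single_attribute_values({"apt_1": 3, "apt_2": 6, "apt_3": 9, "apt_4": 7}, ideal=9)
print(correlated_pairs({"rent": rent, "size": size}))
```

An ideal point in the interior of the scale (here 6 for rent) produces a single-peaked value function, whereas an ideal point at an endpoint (9 for size) produces a monotone one.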

Assessment of weights. MAUD 3 assessed scaling parameters (importance
weights) under an assumption of additive utility independence using a
version of the basic reference lottery tickets (brlts) procedure. (For
details, see Humphreys and Wisudha, Note 4; for more on brlts, see Keeney
and Raiffa, 1976.) For n attributes, MAUD 3 presents n-1 brlts questions,
each consisting of a choice between a "moderate" sure thing (alternative best
on one dimension and worst on the other) and a gamble in which an
"excellent" outcome (alternative best on both dimensions) results with
probability p, and a "poor" outcome (alternative worst on both dimensions)
results with probability 1-p. The two dimensions chosen for
each brlts question are determined by the correlational structure of
the normalized single-attribute values and earlier brlts judgments.
The algorithm is designed to include every attribute in at least one
brlts question, and to include more important attributes in more brlts
questions than less important attributes. An attempt is made to select
early attribute pairs that are positively correlated, thus creating
easily imagined alternatives in the gamble, but not in the sure thing.
Later attribute pairs, more critical to the weight assessment, are
selected so as to bear as little statistical association as possible.
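
To make the link between brlts answers and weights concrete, consider an additive model with the remaining attributes held fixed. A sure thing that is best on attribute i and worst on attribute j is worth w_i, while the gamble is worth p(w_i + w_j), so indifference at probability p implies w_j / w_i = (1 - p) / p. The Python sketch below chains such ratios into normalized weights. It is a simplified illustration, not MAUD 3's algorithm: it assumes that each answer is an indifference probability, that the questions connect all attributes, and it ignores redundant questions. The attribute names and probabilities are hypothetical.

```python
def weights_from_brlts(attributes, answers):
    """Derive additive weights from brlts indifference probabilities.

    answers: list of (best_attr, worst_attr, p), where the sure thing is best
    on best_attr and worst on worst_attr, and p is the probability of the
    "excellent" outcome at which the subject is indifferent.
    """
    ratios = {attributes[0]: 1.0}  # weights relative to the first attribute
    pending = list(answers)
    while pending:
        progressed = False
        for answer in list(pending):
            i, j, p = answer
            if not 0.0 < p < 1.0:
                raise ValueError("indifference probability must be in (0, 1)")
            if i in ratios and j not in ratios:
                ratios[j] = ratios[i] * (1.0 - p) / p
            elif j in ratios and i not in ratios:
                ratios[i] = ratios[j] * p / (1.0 - p)
            elif i not in ratios and j not in ratios:
                continue  # not yet linked to a known weight; retry later
            # If both are already known, the answer is redundant and is dropped.
            pending.remove(answer)
            progressed = True
        if not progressed:
            raise ValueError("questions do not connect all attributes")
    missing = [a for a in attributes if a not in ratios]
    if missing:
        raise ValueError("no brlts question covers: %s" % missing)
    total = sum(ratios[a] for a in attributes)
    return {a: ratios[a] / total for a in attributes}


attributes = ["cost", "location", "size"]
answers = [("cost", "location", 0.5), ("location", "size", 0.8)]
print(weights_from_brlts(attributes, answers))
```

One consequence of this relation is that a subject who gives the same p to every question produces weight ratios that depend only on how the attributes were paired, not on their importance, which is one way to read the abstract's remark that uninstructed responses render the weights virtually meaningless.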

Observation of pilot subjects indicated that most subjects had
trouble understanding the brlts question; in particular, subjects
often became confused and frustrated at trying to keep so many pieces
of seemingly unrelated information in mind at once. Further observation
of pilot subjects provided with the above instruction revealed that there was
a deeper problem inherent in the brlts question. Specifically, most
subjects always switched their preference to the sure thing for values of
p greater than .30 (usually .70 or .30) regardless of the attribute pair.

As a remedy, the experimenter explained that the standing of the sure
thing alternative with respect to the gamble outcomes should be related
to the relative importance of the two varying attributes. The form and
content of the brlts intervention was standardized (as much as possible,
given that the subject was allowed to ask questions), and kept as brief
as possible. An attempt was made to keep the intervention detached from
the flow of the MAUD interaction.

Once the instructions were given, the experimenter asked the
subject to report to the experimenter's office after the last brlts
question. The experimenter left the subject alone for the remainder of
the MAUD 3 session. Most MAUD 3 sessions lasted between 1 and 2 hours.

Analyst Sessions

Five  different  analysts  were  utilized,  including  two  research 
faculty,  one  seventh-year  graduate  student,  and  two  first-year  graduate 
students.  None  of  the  analysts  had  more  than  cursory  experience  with 
applying  MAUA  for  personal  decision  problems,  and  the  two  first-year 
students learned of MAU ideas only a few weeks before their involvement
in the study. After obtaining pre-analyst ratings, the experimenter
introduced the subject to his/her analyst; this assignment was
determined  largely  by  who  was  "in"  at  the  time.  All  analyst  MAUA 
sessions  were  carried  out  in  the  private  office  of  the  analyst,  with 
no  intervention  from  the  experimenter  of  any  kind.  Although 
details  of  the  procedure  varied  across  analysts,  and  even  across 
subjects  assigned  to  the  same  analyst,  all  sessions  were  similar  to 
Edwards's  (1972,  1977)  Simple  Multi-Attribute  Rating  Technique  (SMART). 

Like MAUD 3, the analysts determined a list of relevant attributes,
elicited single-attribute values for each alternative, and assessed
scaling parameters (weights). Although SMART does not suggest any
specific procedure for determining relevant dimensions, retrospective
discussions with analysts indicated that all had used one or more of
the following methods: (1) suggestion of a particular attribute;
(2) asking the "MAUD 3-like" question "How do these alternatives differ?";
(3) asking "How is alternative X attractive?"; (4) asking the subject
to find one aspect on which each and every alternative is attractive;
(5) asking "What attributes do you want to consider?" directly; and
(6) asking "What factors are relevant to the decision?". A distinction
is made between the last two procedures since some analysts allowed
the subject to include any attribute that he/she wanted, whereas other
analysts stressed the requirement of relevancy, thereby screening
out "unimportant" attributes or attributes with little variability.

Single-attribute values were elicited using some version of the
SMART procedure, using 0-100 rating scales for eliciting single-attribute
utilities and ratio procedures for weight assessments. Some
analysts specifically called the subject's attention to the problem
of attribute ranges, explaining that an attribute with a restricted
range among the alternatives at hand should receive less weight than
might be the case if the range were larger. One analyst explained
the concept of attribute importance in terms of "how much one would
like to step from the worst available level of the attribute to the
best available level"; subjects were told to reflect the desirability
of this increment in their direct subjective estimates of weight.
This weight assessment question is similar to that used in the so-called
"swing-weight" elicitation technique.

When the session was completed, the analyst led the subject back
to the experimenter's office. Most analyst sessions lasted from
one to two hours.
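
The arithmetic behind a SMART-style analysis of the kind used in the
analyst sessions can be sketched briefly. The sketch assumes that ratio
judgments of importance are normalized to sum to one and that 0-100
single-attribute ratings are combined additively. The function names,
attribute names, and numbers are hypothetical and are not taken from
the study.

```python
def normalize_ratio_weights(ratio_judgments):
    """Turn ratio judgments of importance into weights that sum to one."""
    total = sum(ratio_judgments.values())
    return {attr: value / total for attr, value in ratio_judgments.items()}


def additive_utility(ratings, weights):
    """Weighted sum of 0-100 single-attribute ratings for one alternative."""
    return sum(weights[attr] * ratings[attr] for attr in weights)


# Hypothetical ratio judgments: the least important attribute is given 10,
# and the others are judged relative to it.
ratio_judgments = {"cost": 10, "location": 20, "size": 10}
weights = normalize_ratio_weights(ratio_judgments)

# Hypothetical 0-100 ratings of two alternatives on each attribute.
ratings = {
    "apartment A": {"cost": 100, "location": 0, "size": 50},
    "apartment B": {"cost": 0, "location": 100, "size": 50},
}

utilities = {name: additive_utility(r, weights) for name, r in ratings.items()}
best = max(utilities, key=utilities.get)
```

Replacing the assessed weights with equal weights (here 1/3 each) gives
the equal-weights model that serves as a comparison in the Results.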

Analyst Evaluations of Attributes

Three of the five analysts (two research faculty and one first-year
graduate student) evaluated all 70 attribute sets (35 subjects X 2 analyses)
in terms of (1) completeness, (2) logical independence, (3) value
independence, and (4) "overall global quality". Each subject's pair of
attribute sets was presented along with a generic name for the four
or more alternatives evaluated. The experimenter abstracted attribute
names from the endpoint labels for MAUD 3 attributes. All analyst judgments
were collected blind, as analysts did not know which attribute set resulted
from the MAUD 3 and which from the analyst session, nor did they know the
subject/analyst source of individual attribute sets.

All  analysts  were  given  written  instructions  defining  completeness, 
logical  independence,  and  value  independence.  No  explanation  was  given 
for  "overall  global  quality".  In  addition,  the  experimenter  met  with 
the  analysts  in  a  group  to  discuss  the  definitions  via  several  examples 
and  to  answer  questions.  The  three  analysts  made  their  judgments 
independently  over  a  period  of  several  days  following  the  meeting. 

For each subject's pair of attribute sets (MAUD 3 and analyst elicited),
analysts made an ordinal judgment as to which more nearly captured the
relevant aspects of the generic evaluation problem (completeness),
and assigned a number reflecting the ratio of the number of aspects
covered by the more complete attribute set to the number covered by
the less complete set. Logical independence and value independence
were judged on 100 point rating scales, each anchored by the attribute
set of the 70 judged least independent (assigned a 0) and the attribute
set of the 70 judged most independent (assigned a 99). An attribute set
is considered to be logically independent if the attribute labels do
not mean the same things semantically. An attribute set is value
independent if the value of an alternative on one attribute is not
influenced by the alternative's value on another attribute, for all
pairs of attributes. Logical independence is much weaker than value
independence, since value independence implies logical independence,
but the reverse is not true. Logical non-independence is one form of
overlap, leading to so-called "double counting"; all such instances are
examples of value non-independence. But value non-independence may arise
from other causes as well. Overall judgments of global quality were also
made on a 100 point rating scale, anchored by the "worst" attribute set
of the 70 (assigned a 0) and the "best" attribute set of the 70 (assigned a 99).

RESULTS

We present three kinds of data analyses. In the first we
examined the convergence of multiattribute models from the MAUD 3 and
analyst sessions and the agreement between models and subjects' holistic
judgments. The second analysis compared the user satisfaction and
acceptance ratings of MAUD 3 and analyst sessions. In the third
analysis, we compared the size and quality of the attribute sets
generated by MAUD 3 and the analysts.

Convergence 

To study convergence we calculated each subject's multiattribute
utilities using single-attribute value ratings from MAUD 3 and analyst
sessions, coupled with either assessed weights or equal weights. Overall,
the convergence across the resulting four models is quite encouraging.
The median Pearson product-moment correlation between multiattribute
utilities of MAUD 3 and analysts (using assessed weights) was .63. For 54%
of the subjects the analyst and MAUD 3 assigned the highest utility
to the same option. Using equal weights for both the analyst and
MAUD 3 increases this convergence somewhat (median product-moment
correlation of .71, with 65% matching highest utility option).
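
The two convergence statistics reported above (the median across
subjects of the Pearson correlation between two models' utilities, and
the share of subjects for whom both models rank the same option
highest) can be computed as in the following sketch. The function
names and the three-subject data set are hypothetical and serve only
to illustrate the calculation.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def convergence(subjects):
    """Median correlation and share of subjects with a matching top option.

    subjects: list of (maud_utilities, analyst_utilities) pairs, one pair
    per subject, with the options listed in the same order in both lists.
    """
    correlations = [pearson(m, a) for m, a in subjects]
    same_top = [m.index(max(m)) == a.index(max(a)) for m, a in subjects]
    return median(correlations), sum(same_top) / len(subjects)


# Hypothetical utilities for three subjects with four options each.
subjects = [
    ([10, 20, 30, 40], [20, 40, 60, 80]),  # same ordering, same top option
    ([10, 20, 30, 40], [40, 30, 20, 10]),  # reversed ordering
    ([10, 40, 20, 30], [10, 40, 20, 30]),  # identical utilities
]
median_r, share_same_top = convergence(subjects)
```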

Another convergence measure was the correlation between subjects'
holistic ratings of the options and the multiattribute utilities
calculated from the models. Table 2 shows the median Pearson correlations
(ranging from .50 to .88), conditionalized on whether subjects first
interacted with MAUD 3 (top half) or the analyst (bottom half). Although
differences are obviously small, three minor trends are suggested. First,
assessed weights had a higher correlation with holistic ratings than did
equal weights in 13 out of the 16 possible comparisons. Secondly,
all 3 sets of holistic ratings appear somewhat more consistent with
the model that either directly preceded or followed the rating.
Thirdly, holistic ratings tended to drift towards closer agreement
with the multiattribute utilities as the sessions progressed. In
six of the eight cases, multiattribute models correlated more
highly with the final holistic ratings than with any of the
remaining three holistic ratings sets.


TABLE 2

Median Pearson Correlations between
Holistic Ratings and MAUA Values

                                Holistic Rating From:
MAUA                    First Session               Second Session
Values     Weights      Pre-MAUD 3   Post-MAUD 3    Pre-analyst  Post-analyst

N=24
MAUD 3     Assessed     .55          .71            .63          .61
           Equal        .67          .63            .59          .71
Analyst    Assessed     .55          .63            .67          .77
           Equal        .58          .58            .56          .69

                        Pre-analyst  Post-analyst   Pre-MAUD 3   Post-MAUD 3
N=11
Analyst    Assessed     .84          .80            .79          .76
           Equal        .50          .60            .61          .62
MAUD 3     Assessed     (illegible)  .80            .83          .88
           Equal        .73          .69            .74          .79

Table 3 shows the convergence of multiattribute models with
holistic ratings in terms of the proportions of subjects whose
ratings and models agreed on the most preferred option. Although
the differences are small, there is a tendency for analyst-derived
utilities to match both the last assessed and the most preferred
holistic ratings more closely than does MAUD 3. However, a
comparison between subjects who received MAUD 3 first with those
who interacted with the analyst first shows that MAUD 3 utilities
agreed more closely with the holistic ratings that directly followed
the first session than did the analyst's utilities (63% vs. 36%
for assessed weights).

Insert  Table  3  about  here 

A final measure of convergence was the subjects' blind rankings
of the sets of ratings produced by holistic judgments and the models.
Twenty-two percent of the subjects indicated more agreement with the
analyst's utilities than with holistic ratings generated either before
or after the analyst session. The same was true of only 12% of the
subjects in the MAUD 3 sessions. When forced to choose between MAUD 3
utilities and analyst utilities (with assessed weights), a slim majority
(58%) indicated more agreement with the analyst's results.

In summary, we found moderately encouraging convergence across
models and between models and holistic judgments. There was no
clear "winner" in the comparison of MAUD 3 and analyst convergence
with holistic judgments. Many of the subtle convergence trends
appeared to be due to ordering of sessions and/or to the temporal
proximity of holistic judgments to the respective modeling activity.

User Satisfaction and Acceptance of MAUA

Next we analysed subjects' expressed satisfaction with and acceptance
of the MAUD 3 vs. analyst sessions. The proportions of subjects
rating MAUD 3 higher than the analyst, and vice versa, are displayed
in Table 4 separately for males and females.

Insert  Table  4  about  here 


Females overwhelmingly indicated a desire to use the analyst rather
than MAUD 3 in future decisions and confidence that the analyst
rather than MAUD 3 recommended the best option. Furthermore,
females found the analyst interaction to be more helpful, more
comfortable, and more effective in discovering new aspects of
the problem than MAUD 3. In contrast, males were split roughly
evenly on the issue of whether MAUD 3 or the analyst brought out
more new aspects of the problem or "helped" to solve the problem.
In addition, males indicated a preference to use MAUD 3 rather than an
analyst for future decisions, despite strong agreement with the
females that analyst interactions are more comfortable and confidence
that the analyst recommendation is more likely to be the
"best" option. Overall, males and females rated both MAUD 3 and
analyst sessions quite high with respect to all five self-report questions.


TABLE 4

Subjects' Impressions of MAUA Sessions

                                          Sex
                              Male (N=13)        Female (N=22)
Question                      M > A    A > M     M > A    A > M

Brought out new aspects?      39%      39%       18%      64%
Felt comfortable?              8%      46%       23%      46%
Helped to solve problem?      31%      39%       18%      68%
Would trust to find
  best alternative?           18%      69%       18%      50%
Would use again?              46%      15%       14%      64%

Note: The proportion of subjects rating MAUD 3 higher than analyst (M > A)
and the proportion rating the analyst higher than MAUD 3 (A > M) sum
to less than one, since some subjects assigned equal ratings.


Quality and Size of Attribute Set

Data on the relative size and quality of MAUD 3 and analyst
elicited attribute sets are presented in Table 5. Percentages for the
size of attributes were based on a simple count.
Percentages for completeness, independence, and quality were generated as
follows. For each subject and each criterion, the ratings were scored as
favoring MAUD 3 if all three raters gave MAUD 3 attributes a
higher rating or if two gave a higher rating and one rated MAUD 3
and the analyst the same. The ratings were counted as favoring
the analyst if the consensus was in the analyst's favor. The
middle column shows the percentage of cases in which no such
decision could be made and therefore indicates the amount of
rater disagreement.
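
The consensus scoring rule just described can be written out as a short
sketch. It assumes exactly three raters and assumes that the rule for
the analyst mirrors the rule stated for MAUD 3; the function names, the
labels, and the example ratings are hypothetical.

```python
from collections import Counter


def consensus(maud_ratings, analyst_ratings):
    """Score one subject on one criterion from three raters' judgments.

    Returns "M > A" if all three raters rated the MAUD 3 set higher, or if
    two did and the third rated the sets the same; "A > M" under the
    mirror-image condition (an assumption); and "?" otherwise.
    """
    pairs = list(zip(maud_ratings, analyst_ratings))
    maud_higher = sum(m > a for m, a in pairs)
    analyst_higher = sum(a > m for m, a in pairs)
    ties = len(pairs) - maud_higher - analyst_higher

    if maud_higher == 3 or (maud_higher == 2 and ties == 1):
        return "M > A"
    if analyst_higher == 3 or (analyst_higher == 2 and ties == 1):
        return "A > M"
    return "?"


def tally(cases):
    """Percentage of cases falling in each of the three columns."""
    counts = Counter(consensus(m, a) for m, a in cases)
    return {label: 100.0 * counts[label] / len(cases)
            for label in ("M > A", "?", "A > M")}


# Hypothetical ratings from three raters for four subjects.
cases = [
    ([60, 70, 80], [50, 60, 70]),  # all three favor MAUD 3
    ([60, 70, 50], [50, 60, 50]),  # two favor MAUD 3, one tie
    ([60, 40, 50], [50, 60, 50]),  # raters disagree
    ([10, 20, 30], [50, 60, 70]),  # all three favor the analyst
]
percentages = tally(cases)
```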

As the first row of Table 5 indicates, analysts elicited more
attributes than MAUD 3 for the majority of subjects, particularly
when the analyst interaction occurred first. There was almost
perfect rater agreement on which attribute set was more complete,
somewhat less consensus on the issue of independence, and substantial
disagreement in judgments of overall quality. Analyst
attribute sets were more often judged more complete than MAUD 3
sets, especially if the analyst session preceded the MAUD 3 session.
Both logical and value independence depended upon the order of
the MAUA sessions. The MAUD 3 attributes were more often judged
more independent for subjects exposed to the analyst interaction
first, but subjects interacting with MAUD 3 prior to an analyst
were split about evenly as to attribute independence. Finally,
judgments of overall attribute quality heavily favored analyst
elicitations, regardless of the session order.

The attribute set features considered in Table 5 were highly
related. In particular, the attribute set of each pair with the
greater number of dimensions was overwhelmingly chosen as being
more complete, less independent, and of higher quality than the
other attribute set. Ordinal judgments of overall quality were
virtually identical to those of completeness.


TABLE 5

Size and Quality of Attribute Sets*

                                       Session Order
                          MAUD 3 first (N=24)     Analyst first (N=11)
Attribute Set Features    M > A    ?     A > M    M > A    ?     A > M

# of Attributes           22%     17%    61%       9%      9%    82%
Completeness              22%     17%    61%       9%      0%    91%
Logical Independence      26%     39%    35%      64%     27%     9%
Value Independence        39%     26%    35%      64%     27%     9%
Overall Quality           13%     43%    44%      18%     36%    46%

*The column (?) gives the percentages of cases in which raters
disagreed or in which MAUD 3 tied the analyst.


DISCUSSION  AND  CONCLUSIONS 

MAUD  3  and  the  analyst  sessions  produced  highly  convergent 
multiattribute  utilities.  This  finding  is  consistent  with 
Fryback,  Gustafson,  and  Rose's  (Note  7)  report  of  MAUAs  for 
evaluating  the  severity  of  ischemic  heart  disease.  They  presented 
data  suggesting  that  MAUA  model  results  are  quite  insensitive  to 
widely  varying  problem  formulations  (i.e.,  attribute  structures), 
assessment  settings,  and  experts.  However,  we  did  find  that 
multiattribute  utilities  were  more  sensitive  to  the  problem 
structure  than  to  weight  parameter  assessments.  Differences  in 
MAUD  3  and  analyst  attributes  and  single-dimension  values,  and 
changes  in  subjects'  values  over  the  intervening  week,  accounted 
for  more  variation  in  model  values  than  importance  weights.  This 
result  confirms  many  empirical  and  analytical  findings  of  the 
insensitivity  of  multiattribute  utilities  to  weights. 

Subjects' agreement with the MAUD 3 and analyst utilities
differed little across session orders, problem types, analysts,
or subject sex and race. Assessed weights produced multiattribute
utilities in more agreement with holistic judgments than did equal
weights. Holistic ratings tended to agree with the most contiguous
model values; repeated holistic ratings tended to converge
toward agreement with utilities calculated from the models. Unfortunately,
it is somewhat difficult to interpret convergence or the lack
of it directly as an indicator of the quality of the analysis. Low
convergence could mean that the analysis has totally gone awry, or
it could be indicative of a deeper, more valid evaluation than
the subject is capable of in his/her own holistic ratings.

Our subjects became quite involved in both MAUD and analyst
sessions. Subjective ratings of both sessions were greatly skewed
toward the high end. Subjects were highly motivated, and their
responses seemed more thoughtful and considered than in our experience
with the thought experiments employing hypothetical scenarios that are
typical of laboratory experiments with college subjects.

Our sex differences with respect to user satisfaction are
curious. One interpretation is that our male subjects possibly
had more experience with or aptitude for computer-like tasks. Our
impression of the subjects, based on an admittedly brief experience,
does not support this hypothesis, however. Yet another explanation
lies in a possible analyst sex/subject sex interaction effect;
all but one of the analysts were male. Future experiments should
certainly better counterbalance for the sex of both subject and
analyst.

The median number of attributes elicited was greater for
analyst sessions (7.5) than for MAUD 3 sessions (5.9); however, one
analyst averaged 10 attributes per session, while another averaged
only a little over 5. The 10-attribute analyst was rated higher
than the other four analysts in terms of subjects' impressions of
the session, but received the lowest amount of acceptance of the
resulting alternative orderings. The five-attribute analyst,
however, received the lowest subjective ratings of all, but
achieved the greatest degree of acceptance of final alternative
orderings. Our findings seem to indicate that subjects feel better
taken care of when more attributes are included in the analysis, but
that subjects' holistic ratings are better accounted for by analyses
with smaller rather than larger numbers of attributes.

Our  findings  regarding  the  size  and  quality  of  attribute  sets 
suggest  that  our  analysts'  notions  about  attribute  elicitation  are 
much  like  those  of  our  subjects:  the  more  the  better.  Although 
MAUD  3  elicited  smaller,  less  "complete"  attribute  sets,  they  were 
judged  to  be  more  independent,  both  logically  and  valuewise.  This 
result  is  presumably  due,  at  least  in  part,  to  the  effectiveness  of 
the  MAUD  3  mechanism  for  identifying  statistically  related  attributes 
and  presenting  them  for  combination  under  a  single  heading. 

Of course, our findings cannot be interpreted in a vacuum. Proper
consideration should be given to the subject population, problem types,
analyst experience and method (SMART), and the particular MAUA software
we employed (MAUD 3). In particular, we should comment on the
peculiarities of the MAUD 3 program. We found that MAUD 3 is not truly
"stand alone". Many of our subjects asked for assistance in the
attribute elicitation phase of the program. Typical mistakes included:
repetition of attributes (up to 15 times); including more than one
attribute in a given attribute definition; and thinking about other
attributes when specifying the "ideal point" and/or scale values on
an attribute. MAUD 3 should give the subject more information
concerning attribute elicitation, as the "difference questions" are simply
too abstract and nondirective.


We also found that very few subjects were able to answer the
brlt question properly. Most subjects had initial difficulties
understanding this question, and even after careful instructions
they experienced some problem keeping track of the different pieces
of information that constitute brlts. There also appeared to be a
response bias that is built into the sequencing of the brlt questions.
That sequence reduces the attractiveness of the gamble until the
sure thing is either preferred to or indifferent to the gamble.
Subjects, inclined to stop an obviously difficult information
processing task, appeared to choose the sure thing even before they
reached their indifference point.

In spite of these problems, the computer sessions compared
quite favorably with the analyst sessions. This general result is
encouraging for those who see the future of decision analysis in
computerized and possibly stand-alone decision aids. We conceptualize
the development of computerized decision aids in a 3-dimensional
framework: (1) The extent to which the program requires the services
of someone knowledgeable of either DA or the operation of the program;
(2) The degree to which available problem structures are organized into
empty DA categories vs. oriented toward problem-specific structures
that make use of prototypical features generic to all problems of a
given class; and (3) The complexity and data base availability of the
modeling approach. (Buede, Note 8, calls (3) the engineering
science-clinical art dimension of decision aiding.)

The results of our experiment suggest that stand-alone decision
aids are feasible. We believe that many of the issues corresponding
to problem structuring and option invention can be eliminated by
creating generic problem structures, complete with a general structure
and a set of options that can be both pruned and added to. We feel
that user satisfaction was largely mediated by the fact that our analysts
could recommend options and objectives to the decision maker directly,
whereas MAUD 3 could not. Perhaps user confidence would be enhanced further
by including a problem related data base, thus allowing the subject to
employ as complex and complete a model of the choice problem as
seems desirable.

FOOTNOTES

1. All too often, value experiments utilize hypothetical scenarios
that more resemble problem-solving tasks, inviting the subject
to play a "numbers-game" in which consistency is the winning
move. Consequently, many experiments that employ decision
analytic value models in assessing a role-played preference
structure never come close to any evaluative or affective
construct, so necessary to the usual notion of value.